 visual object detection


FreeAnchor: Learning to Match Anchors for Visual Object Detection

Neural Information Processing Systems

Modern CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match approach to break the IoU restriction, allowing objects to match anchors in a flexible manner. Our approach, referred to as FreeAnchor, updates hand-crafted anchor assignment to "free" anchor matching by formulating detector training as a maximum likelihood estimation (MLE) procedure. FreeAnchor aims to learn features that best explain a class of objects in terms of both classification and localization. FreeAnchor is implemented by optimizing a detection-customized likelihood and can be fused with CNN-based detectors in a plug-and-play manner. Experiments on MS-COCO demonstrate that FreeAnchor consistently outperforms its counterparts by significant margins.
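To make the IoU restriction concrete, here is a minimal sketch of the hand-crafted assignment the abstract refers to. It is not the authors' code: the `(x1, y1, x2, y2)` box format, the helper names `iou` and `assign_by_iou`, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_by_iou(anchors, gt_box, threshold=0.5):
    """Indices of anchors whose IoU with gt_box meets the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return [i for i, a in enumerate(anchors) if iou(a, gt_box) >= threshold]


anchors = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
print(assign_by_iou(anchors, gt_box=(0, 0, 2, 2)))
```

Under a fixed rule like this, an anchor either passes the threshold or is ignored, whatever the features inside it look like. That rigidity is what the learning-to-match formulation is meant to relax.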


Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos

Biswas, Dipayan, Shah, Shishir, Subhlok, Jaspal

arXiv.org Artificial Intelligence

We introduce the Lecture Video Visual Objects (LVVO) dataset, a new benchmark for visual object detection in educational video content. The dataset consists of 4,000 frames extracted from 245 lecture videos spanning biology, computer science, and geosciences. A subset of 1,000 frames, referred to as LVVO_1k, has been manually annotated with bounding boxes for four visual categories: Table, Chart-Graph, Photographic-image, and Visual-illustration. Each frame was labeled independently by two annotators, resulting in an inter-annotator F1 score of 83.41%, indicating strong agreement. To ensure high-quality consensus annotations, a third expert reviewed and resolved all cases of disagreement through a conflict resolution process. To expand the dataset, a semi-supervised approach was employed to automatically annotate the remaining 3,000 frames, forming LVVO_3k. The complete dataset offers a valuable resource for developing and evaluating both supervised and semi-supervised methods for visual content detection in educational videos. The LVVO dataset is publicly available to support further research in this domain.


Reviews: FreeAnchor: Learning to Match Anchors for Visual Object Detection

Neural Information Processing Systems

I am raising my score to seven. The authors begin by noting that many existing object detection pipelines include an 'anchor assignment' step: from a large set of candidate bounding boxes (or "anchors") in a generic image frame, the one that best matches the ground-truth bounding box, as measured by IoU, is chosen for training, i.e., the object detection and bounding box regression outputs for that anchor are pushed towards the ground truth. The authors note that for objects which don't fill the anchor well (slim objects oriented diagonally, objects with holes, or occluded objects), the best anchor according to this IoU comparison may be actively bad for training as a whole. The authors propose "learning to match", i.e., producing a custom likelihood which promotes both precision and recall of the final result (making reference to terms from the traditional loss function). For each ground-truth bounding box, a 'bag of anchors' is selected by ranking IoU and picking the best n. During training, a different bounding box is selected from this bag for each object, for each backward pass.
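The bag-construction step described in the review (rank anchors by IoU, keep the best n) can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the function names `pairwise_iou` and `anchor_bags`, the `(x1, y1, x2, y2)` box layout, and the default `n=3` are hypothetical, and the sketch does not cover how one anchor is then selected from each bag during training.

```python
import numpy as np


def pairwise_iou(gt, anchors):
    """IoU matrix of shape (G, A) for boxes given as rows of (x1, y1, x2, y2)."""
    gt = np.asarray(gt, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Top-left and bottom-right corners of every pairwise intersection.
    lt = np.maximum(gt[:, None, :2], anchors[None, :, :2])
    rb = np.minimum(gt[:, None, 2:], anchors[None, :, 2:])
    wh = np.clip(rb - lt, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    union = area_g[:, None] + area_a[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def anchor_bags(gt, anchors, n=3):
    """For each ground-truth box, indices of the n anchors with highest IoU."""
    ious = pairwise_iou(gt, anchors)
    # Stable sort on negated IoU gives descending order with deterministic ties.
    order = np.argsort(-ious, axis=1, kind="stable")
    return order[:, :n]


gt = [[0, 0, 10, 10]]
anchors = [[0, 0, 10, 10], [0, 0, 5, 10], [20, 20, 30, 30], [1, 1, 10, 10]]
print(anchor_bags(gt, anchors, n=2))
```

Keeping a bag rather than a single best anchor leaves the final choice open, so the training objective can favor whichever candidate yields the best classification and localization.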


Reviews: FreeAnchor: Learning to Match Anchors for Visual Object Detection

Neural Information Processing Systems

The paper presents a better loss function for anchor-based detection methods by matching anchors to GT boxes in a differentiable manner. Three reviewers recommend acceptance after a convincing rebuttal. The final decision is to accept.


FreeAnchor: Learning to Match Anchors for Visual Object Detection

Zhang, Xiaosong, Wan, Fang, Liu, Chang, Ji, Rongrong, Ye, Qixiang

Neural Information Processing Systems

Modern CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match approach to break the IoU restriction, allowing objects to match anchors in a flexible manner. Our approach, referred to as FreeAnchor, updates hand-crafted anchor assignment to "free" anchor matching by formulating detector training as a maximum likelihood estimation (MLE) procedure. FreeAnchor aims to learn features that best explain a class of objects in terms of both classification and localization. FreeAnchor is implemented by optimizing a detection-customized likelihood and can be fused with CNN-based detectors in a plug-and-play manner.


Implementing RoI Pooling in TensorFlow Keras

#artificialintelligence

In this post we explain the basic concept and general usage of RoI (Region of Interest) pooling and provide an implementation using Keras layers and the TensorFlow backend. The intended audience for this post is people who are familiar with the basic theory of (Convolutional) Neural Networks and who are capable of building and running simple models using Keras. If you are here just for the code, help yourself to this gist and do not forget to like and share the article! RoI Pooling was proposed by Ross Girshick in the Fast R-CNN paper as part of his object recognition pipeline. In the general use case for RoI Pooling we have an image-like object and multiple regions of interest specified via bounding boxes.
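As a rough illustration of the idea, the sketch below max-pools one region of a feature map into a fixed-size grid. It uses plain NumPy rather than the Keras/TensorFlow layer the post describes. The function name `roi_max_pool`, the absolute integer `(x1, y1, x2, y2)` RoI format with exclusive end coordinates, and the bin-splitting rule are assumptions made for this example.

```python
import numpy as np


def roi_max_pool(feature_map, roi, pooled_h, pooled_w):
    """Max-pool one RoI of an (H, W, C) feature map to (pooled_h, pooled_w, C).

    roi is (x1, y1, x2, y2) in integer feature-map coordinates, with x2 and y2
    exclusive. This coordinate convention is an assumption of this sketch.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    h, w, c = region.shape

    # Integer bin edges that split the region as evenly as possible.
    ys = [(i * h) // pooled_h for i in range(pooled_h + 1)]
    xs = [(j * w) // pooled_w for j in range(pooled_w + 1)]

    out = np.empty((pooled_h, pooled_w, c), dtype=feature_map.dtype)
    for i in range(pooled_h):
        for j in range(pooled_w):
            # Guarantee each bin covers at least one row and one column.
            y_end = max(ys[i + 1], ys[i] + 1)
            x_end = max(xs[j + 1], xs[j] + 1)
            out[i, j] = region[ys[i]:y_end, xs[j]:x_end, :].max(axis=(0, 1))
    return out


feature_map = np.arange(16).reshape(4, 4, 1)
pooled = roi_max_pool(feature_map, roi=(0, 0, 4, 4), pooled_h=2, pooled_w=2)
print(pooled[..., 0])
```

Whatever the size of the input region, the output always has the same shape, which is what lets regions of different sizes feed the same fully connected layers downstream.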


Crowdsourcing Annotations for Visual Object Detection

Su, Hao (Stanford University) | Deng, Jia (Stanford University) | Fei-Fei, Li (Stanford University)

AAAI Conferences

A large number of images with ground-truth object bounding boxes is critical for learning object detectors, a fundamental task in computer vision. In this paper, we study strategies to crowd-source bounding box annotations. The core challenge of building such a system is to effectively control the data quality with minimal cost. Our key observation is that drawing a bounding box is significantly more difficult and time-consuming than giving answers to multiple-choice questions. Thus quality control through additional verification tasks is more cost-effective than consensus-based algorithms. In particular, we present a system that consists of three simple sub-tasks: a drawing task, a quality verification task, and a coverage verification task. Experimental results demonstrate that our system is scalable, accurate, and cost-effective.